Seasonal dynamics in picocyanobacterial abundance and clade composition at coastal and offshore stations in the Baltic Sea

Picocyanobacteria (< 2 µm in diameter) are significant contributors to total phytoplankton biomass. Due to the high diversity within this group, their seasonal dynamics and relationship with environmental parameters, especially in brackish waters, are largely unknown. In this study, the abundance and community composition of phycoerythrin rich picocyanobacteria (PE-SYN) and phycocyanin rich picocyanobacteria (PC-SYN) were monitored at a coastal (K-station) and at an offshore station (LMO; ~ 10 km from land) in the Baltic Sea over three years (2018–2020). Cell abundances of picocyanobacteria correlated positively to temperature and negatively to nitrate (NO3) concentration. While PE-SYN abundance correlated to the presence of nitrogen fixers, PC-SYN abundance was linked to stratification/shallow waters. The picocyanobacterial targeted amplicon sequencing revealed an unprecedented diversity of 2169 picocyanobacterial amplicons sequence variants (ASVs). A unique assemblage of distinct picocyanobacterial clades across seasons was identified. Clade A/B dominated the picocyanobacterial community, except during summer when low NO3, high phosphate (PO4) concentrations and warm temperatures promoted S5.2 dominance. This study, providing multiyear data, links picocyanobacterial populations to environmental parameters. The difference in the response of the two functional groups and clades underscore the need for further high-resolution studies to understand their role in the ecosystem.


Results
Environmental conditions. Sampling was carried out every second week at the Linnaeus Microbial Observatory (LMO), an offshore station and weekly at the K-station, a coastal station during 2018-2020. Seawater temperature ranged from 0 to 5 °C during winter at both stations and increased during spring (~ 3-15 °C) and summer (~ 15-20 °C; Fig. 1A). In 2018, there was an earlier increase of temperature during summer compared to 2019 and 2020 (average 3 °C higher in the beginning of June) and the maximum temperatures during the study period were 24 and 22 °C at the K-station and LMO respectively. At LMO, the increase of temperature during summer resulted in stratification (mixed layer approx. depth: 20 m) that lasted until early to mid-autumn (stratification index (N 2 ): 0.2-0.66 × 10 -3 s −2 ; Supplementary Fig. S2A and B). In 2018, early signs of stratification were observed during spring due to the exceptionally high temperatures. Salinity ranged between 6.5 and 8 PSU (Fig. 1B). At the K-station, salinity increased during the spring to summer period while no clear seasonality was observed at the LMO. Inorganic nitrogen (NO 3 ) concentrations ranged between < 0.06 (detection limit) and 5 µM, with the highest concentrations observed at the K-station during winter, which sharply declined upon temperature increase during spring (Fig. 1C). Phosphate concentrations (PO 4 ) decreased during spring (from 1 to 0.2 µM) and remained low throughout the summer with occasional peaks up to 1 µM at the K-station (Fig. 1D). At the K-station, silicate (SiO 2 ) showed a strong seasonality ranging from 4 to 25 µM with peaks in summer and early autumn (Fig. 1E). Silicate did not show a clear seasonality at LMO. Concentrations ranged between 10 and 20 µM during 2018, sharply declined during spring in 2019 and remained stable around 5 µM throughout 2020.
Phytoplankton dynamics. Chlorophyll a (Chl a) concentration was generally higher at the K-station (1-7 µg L -1 ) compared to LMO (1-4.5 µg L −1 ; Fig. 2A). At the K-station, phytoplankton carbon biomass increased from 0.4 mg C mL −1 in the autumn to winter period up to 50.0 mg C mL −1 in the spring and 147.7 mg C mL −1 in the summer (Fig. 2B). The phytoplankton community was dominated by diatoms during autumn, winter and first half of the spring (Fig. 2C). During the second half of the spring the community transitioned to ciliates and dinoflagellates dominance. N 2 -fixers (filamentous cyanobacteria) were only observed during the summer 2018, reaching 111.3 mg C mL −1 in July. At the LMO, the maximum carbon biomass increased from 5.9 mg C mL −1 in the autumn and winter up to 597.4 mg C mL −1 in the spring and 167.7 mg C mL −1 in the summer (Fig. 2B). During the autumn, winter and spring, the community was dominated by ciliates and dinoflagellates (Fig. 2D), and occasionally by diatoms. During the summer the phytoplankton community was dominated by N 2 -fixers, reaching 111.3 mg C mL −1 total carbon biomass. www.nature.com/scientificreports/ cells mL −1 ) and summer to autumn (up to 2.6 × 10 5 cells mL −1 ) at the K-station and LMO respectively (Fig. 2E). PC-SYN increased from low concentrations in the winter (K-station: 72 cells mL −1 , LMO: 8 cells mL −1 ) to peak concentrations during spring and summer (K-station: 2.1 × 10 5 cells mL −1 , LMO: 4.8 × 10 3 cells mL −1 ; Fig. 2F).
To analyze the effect of each independent variable on PE-SYN and PC-SYN abundance, we used the partial least square regression (PLS) method to establish a regression equation. The number of components utilized for each equation was 2 and 6 for PE-SYN and PC-SYN respectively. Only the independent variables with variable importance in projection (VIP) > 1 were considered relevant for the final equation (Table 1). The model for PE-SYN explained 38% of the cell abundance seasonal changes with 52% variation of the independent variables. The regression equation between PE-SYN cell abundance and the independent variables was (1): The model for PC-SYN explained 40% of the cell abundance seasonal changes with 46% variation of the independent variables. The regression equation between PC-SYN cell abundance and the independent variables was (2): (2) y = −0.28 · NO 3 (µM) + 0.13 · T C • + 0.35 · N 2 (10 −3 s −2 )  (maximum contribution 2019-08-13: 91%), followed by an increase in KS3 at the end of the summer to beginning of autumn (maximum contribution 2019-10-08: 68%). At the LMO the period of late summer to early autumn was occasionally dominated by S5.2 (maximum contribution 2018-09-11: 68%). The picocyanobacterial community was more diverse at the coastal K-station compared to the offshore LMO (K-station registered 1026 ASVs and LMO registered 195 ASVs). The phylogenetic classification (which only includes ASVs that represented > 1% of the sequences for at least one sampling point) showed that some sequences that belong to the clades KS2 (ASV00070), KS5 (ASV07850) KS6 (ASV00066) and S5.2 (ASV07854 and ASV07849) were only present at the coastal K-station. The number of ASVs was consistently higher at the K-station, with the highest numbers of ASVs present during autumn to early spring (Fig. 3).
Correlations between environmental and biological variables with picocyanobacterial composition. The test on Procrustes analysis and the CAs revealed a significant association between the distribution of the phylogenetic clades and the ASVs (squared m12 P value < 0.001, Supplementary Fig. S4). The stepwise CCA showed a significant relationship between clade composition and eight environmental and biological variables: NO 3 (µM), SiO 2 (µM), PO 4 (µM), temperature (°C), salinity (PSU), N 2 -fixers biomass (mg C mL −1 ), PE-SYN (cells mL −1 ) and PC-SYN (cells mL −1 ; Table 2). The first two axis explained 49% of the taxonomic composition (Fig. 4). The first axis indicated a clear separation of summer samples. Summer samples were associated  www.nature.com/scientificreports/ with high abundance of PE-SYN and PC-SYN, increase of temperature and high PO 4 . Autumn, winter and spring sites from the model clustered together and were associated with high NO 3 . PC-SYN abundance and temperature were the most important variables as indicated by the length of the variable arrows (Fig. 4).

Discussion
This study shows that temperature was one of the main parameters driving picocyanobacterial abundance. The correlation between temperature and increase in picocyanobacterial abundance is well known in marine 8,9 , estuarine 63,64 and freshwater systems 65,66 . Picocyanobacterial abundance started increasing at > 10 °C, in line with other temperate ecosystems (e.g. 67 ) and previous records in the Baltic Sea 9 . Peak abundances at the coastal K-station and offshore LMO during 2019 and 2020 were in the same range, with the exception of the summer of 2018, when K-station peak abundances reached 4.7 × 10 5 cells mL −1 . These numbers are comparable to other observations in the Baltic Proper during summer (10 5 cells mL −1 ), and confirm previous observations that picocyanobacterial abundances are as high at coastal and offshore locations 55,57,68 . It is also important to note that the picocyanobacterial abundances at < 10 °C reported in this study were notably higher than previous reports in the Gulf of Finland and in other temperate ecosystems 8,9 . The thermal bound for SYN has recently been defined as > 5 °C 18 . However, in this study abundances of > 10 4 cells mL −1 were recorded during winter time at both K-station (0-5.7 °C) and LMO (2.7-5.8 °C), suggesting that the strains present in the Baltic Sea are well adapted to low temperatures, in line with previous observations in marine 69 and freshwater environments 70 . According to the PLS models independent variables only explained 38 and 40% of the total variation of PE-SYN and PC-SYN respectively. This indicates that other important controllers such as light quality 4 , grazing  www.nature.com/scientificreports/ by ciliates and flagellates 71 or viral lysis 72 may be important to include in future models. SYN was divided into PE-SYN and PC-SYN depending of the pigment content. PE-SYN and PC-SYN coexisted at similar abundances during the summer, confirming previous observations based on cpcBA and cpeBA libraries 29 . PE-SYN are better adapted to blue light which can penetrate deeper in the mixing layer, while PC-SYN are adapted to red light, which is dissipated at the surface 4 . As a result, PE-SYN showed no correlation with the stratification index (N 2 ) and was equally prevalent at both the K-station and the LMO. On the other hand, PC-SYN abundance variation was strongly linked to N 2 and was more prevalent at the coastal K-station compared to the offshore LMO, in line with observations in other estuaries and freshwater lakes 11,73 . These results suggest a horizontal gradient in the Baltic Sea, where PC-SYN is more prevalent in coastal shallow areas while in offshore areas abundances are lower and tightly joined to stratification. In the future, an increase in the stratification periods as a result of global warming could reinforce PC-SYN dominance on the picocyanobacterial community. PC-SYN have been observed to avoid predation, viral lysis and to have a negative effect on co-occurring filter-feeders, which can affect the energy flow to upper trophic layers [74][75][76] . Thus, understanding the physiology and ecology of PC-SYN, a generally understudied group of SYN, is of high importance for the understanding of current and future climate scenarios. Nutrient availability, particularly nitrogen species, was correlated to picocyanobacterial dynamics according to the PLS models. Both PE-SYN and PC-SYN showed moderate negative correlation with NO 3 concentration. Some studies have suggested preference of picocyanobacteria for NH 4 over NO 3 at high temperature conditions (> 15 °C) 55,77 . Thus, newly fixed nitrogen in the form of NH 4 from N 2 -fixers or regeneration during blooms may be an important driver supporting picocyanobacterial growth during summer [31][32][33] , in line with the positive correlation between N 2 -fixers and PE-SYN. However, PE-SYN peak abundances at the K-station and LMO were in the same range, although N 2 -fixers were only observed at the K-station during 2018. This suggests that PE-SYN can benefit from newly fixed nitrogen, but the presence of N 2 -fixers is not a requirement to achieve peak abundances and thus other nitrogen pools should also be considered. The main nitrogen source for picocyanobacteria during the summer may be regeneration through ammonification 34 . This may be especially important in coastal and shallow water areas (< 50 m depth) where studies have estimated that it can represent up to 97% of the nitrogen requirements 33,78 . In addition, peak abundances at the LMO were sustained during the first half of the autumn at < 10 °C, which indicates that picocyanobacteria can uptake NO 3 at low concentrations efficiently 52 . Direct uptake of DON may also benefit SYN over other phytoplankton [35][36][37] .
The community composition was studied using 16S rRNA gene sequences amplified using specific primers that target almost exclusively picocyanobacteria 62 . The results corroborate that the V5-V7 region of the 16S rRNA gene showcases higher variability in picocyanobacteria than the V3-V4 region, revealing an unprecendented high strain diversity in the Baltic Sea with a particularly high number of ASVs at the coastal K-station. Most of the previously defined clades and clusters were described 62 , but some clades were not well resolved. For example, clades A and B clustered together (clade A/B) contrasting with phylogeny based on other regions of the 16S rRNA gene 12,79 . All ASVs in clade A/B displayed similar seasonal variation in relative abundance, suggesting a similar ecophysiology whithin the clade. The increase in contribution of S5.2 took place in June-July, when temperature was > 18 °C and NO 3 concentration was low. This result suggests that S5.2 affiliated picocyanobacterial strains are adapted to high temperatures and may use NH 4 as a primary nitrogen source 33,55,77 . On the other hand, clade A/B dominated during the colder months, indicating that picocyanobacteria strains in this clade have a preference for low temperatures and high NO 3 concentration.
This study indicates a coastal to offshore differentiation in picocyanobacterial community composition. The coastal K-station presented higher ASV diversity than the offshore LMO. Moreover, the clades KS2, KS5 and KS6 were only present at the coastal K-station, which suggest that some clades are only present in the coastal region. These results contradict previous observations in the Baltic Proper where no significant differences in community composition were observed in coastal offshore gradients 6 . One explanation is that the higher resolution achieved with the primers in this study have revealed differences that could not be detected with less specific primers. Community composition seasonal dynamics in the coastal and offshore stations also showed major differences. At the K-station picocyanobacterial peak abundances took place when S5.2 was dominating the community while at the LMO, peak abundances took place under clade A/B dominance. The composition at the coastal compared to the offshore station could be driven by low PO 4 levels as S5.2 was positively linked to PO 4 concentration. At the LMO, PO 4 was low during summer (mean summer concentration: 0.17 µM) while it remained high at the K-station (mean summer concentration: 0.63 µM), explaining the lower contribution of S5.2 in the LMO. The potential PO 4 limitation could also explain why at the LMO S5.2 has lower contributions during summer to the community composition compared to the K-station. However, to fully understand picocyanobacterial dynamics, other parameters such as light hours 80 , NH 4 recycling rates 33 or nutrient competition with specific phytoplankton groups 77 should also be considered.
The most abundant ASV in the dataset, ASV_00001, was identical to the V5-V7 region of a metagenomeassembled genome (MAG) reconstructed from the Baltic Proper (BACL30) and has been identified as dominant in the Baltic Sea 16,60 . Phylogenetic classification based on six ribosomal proteins included BACL30 in the S5.2 clade 16 ; however this classification may have been biased by the lack of genome sequenced freshwater SYN strains. In this study, ASV_00001 had 99% identity with the strain MW73D5, a freshwater strain included in clade A/ B 12 . Most of the ASVs of the picocyanobacterial community in the Baltic Sea were more similar to freshwater strains rather than estuarine and marine strains. Hugerth et al. 60 studied bacterial MAG phylogeny (including BACL30 as the only representative for SYN) from brackish environments and observed that a significant proportion of the MAGs had closest affiliation to strains of brackish origin. The affiliation of ASV_00001 to clade A/B (freshwater) suggests that Baltic Sea picocyanobacteria have evolved from freshwater strains. ASV_00001 showed high contributions during the summer, particularly at the LMO (maximum relative contribution: 78%), www.nature.com/scientificreports/ in line with observations in other offshore locations 15,16 . However, the highest contributions took place in the cold months, indicating that BACL30 is well adapted to low temperature conditions. This study provides a detailed description of picocyanobacterial seasonal abundance and community composition during three years at a coastal and an offshore station in the Baltic Proper, showing that SYN are highly adaptable and diverse. The results link SYN populations to environmental parameters. The Baltic Sea is warming up faster than other oceans 81 , and it is predicted that metereological summer in the Baltic Proper will take place 20 days earlier by 2100 82 . Longer and warmer summers could result in earlier picocyanobacterial blooms in the coast as a consequence of achieving optimal temperatures for S5.2 ecotypes earlier in the spring/summer season. This effect could be further magnified by earlier and more extensive blooms of N 2 -fixing cyanobacteria resulting in higher NH 4 availability, which are projected as a consequence of global warming 82,83 . However, at offshore locations in the Baltic Proper, the picocyanobacterial summer bloom could be delayed since optimal temperature for clade A/B would take place later in the year. In future climate conditions, the contribution of picocyanobacteria to the carbon pools is expected to increase while bigger phytoplankton (PPE and other protists) are expected to decrease 84,85 . These changes in community composition together with potential changes in SYN peak abundances timing could result in significant alterations of picophytoplankton grazers, and consequently the rest of the trophic chain 86 . The results in this study highlight that besides temperature, water column stratification and nutrient availability also play an important role in picocyanobacterial dynamics and community composition.

Methods.
Field sampling. Water was sampled every second week at the Linnaeus Microbial Observatory (LMO, 56° 55′ 51.24″ N, 17° 3′ 38.52″ E, 2 m sampling depth), an offshore station located 10 km off the east coast of Öland (Sweden) and weekly at the K-station, a coastal station located in the city of Kalmar, Sweden (56° 39′ 25.4″ N 16° 21′ 36.6″ E, 1 m sampling depth) during 2018-2020. The temperature and salinity were measured using a conductivity/temperature/depth sensor CTD® Castaway at the K-station and a CTD probe (AAQ 1186-H, Alec Electronics, Japan) at the LMO. To remove large particles, samples were filtered through a 200 µm mesh.
Nutrients and Chl a. Water for measuring dissolved inorganic nutrients (NH 4 , NO 2 + NO 3 (referred to as NO 3 ), PO 4 and SiO 2 ,) was sampled and frozen at − 20 °C until analysis using standard protocols (UV-Spectrophotometer 87 ). For measuring Chl a, 50-200 mL seawater was filtered through duplicate 25 mm A/E glass fiber filters (∼1 μm pore size, Pall life Sciences, Ann Arbor, MI, USA). Filters were incubated overnight in darkness in 5 mL of ethanol (96%) and fluorescence was measured the following day using a fluorometer (Turner design Model #040, Tucson, USA) following the Jespersen & Christoffersen 88 protocol.
Phytoplankton abundance and community composition. Samples for phytoplankton identification (> 5 µm diameter) were fixed in acid lugol (1% final concentration) and counted according to the Utermöhl method 89 using an inverted microscope (Nikon TMS, Tokyo, Japan). The phytoplankton carbon biomass concentration (mg C mL −1 ) was derived from the cell abundance and carbon biomass 89,90 . In addition to the traditional taxonomy the genus Aphanizomenon, Nodularia and Dolichospermum were included in the group defined as N 2 -fixers.
Picophytoplankton abundance. Samples to determine picophytoplankton abundance were fixed with glutaraldehyde solution Grade I 25% in H2O (Sigma-Aldrich, Missouri, USA; 1% final concentration) and stored at − 80 °C until analysis. K-station samples until 12th May 2020 and LMO samples until 10th July 2019 were analyzed using a Cyflow® Cube8 flow cytometer (Partec®, Germany) at 10 µL s −1 while after that date a BD FACSverse (BD Biosciences) was used instead. Picophytoplankton were counted as three populations: photosynthetic picoeukaryotes (PPE), phycoerythrin rich (PE-SYN) SYN cells and phycocyanin rich (PC-SYN) SYN based on the gating described in Alegria Zufia et al. 55  Bioinformatics processing. The resulting reads were denoised and screened for chimera removal with ampliseq (v1.1, https:// github. com/ nf-core/ ampli seq) which runs on QIIME2 (2019.10) 92 and DADA2 (1.10.0) 93 and taxonomy assignment of the resulting amplicon sequencing variants (ASVs) were done using the SILVA 132 database with a 90% identity threshold. Among all sequences in the libraries, 95% belonged to cyanobacteria, of which 97% were classified as Synechococcales. Then, a phylogenetic tree was constructed with the aligned sequences to cross check that all of them indeed affiliated with Synechococcales; sequences that did not affiliate were excluded from further analyses. The number of sequences for each sample after each step of the quality control pipeline are specified in Supplementary Table S1.
Phylogenetic analysis. The ASVs that represented > 1% of the sequences in at least one sampling point were selected. To determine the phylogenetic clade affiliation, the closest representative sequences were identified and retrieved using the BLASTn-Search engine in the NCBI database. Sequences were aligned using MAFFT v.7 94 and a phylogenetic tree was constructed by the maximum likelihood method (ML) in MEGAX software 95 following the GTR + G + I model (bootstrap values inferred from 1000 replicates). ASVs were associated to a specific clade or subgroup when they were located in a (mostly) monophyletic branch that contains a well defined reference sequence associated to that clade. References for clades were obtained from Huber et al. 62 , Crosbie et al. 96 , Silva et al. 97 , Marsan et al. 98 and Choi et al. 45 . Branches that did not associate with previously known clades were given new clade names. The resulting tree was edited using the interactive tree of Life (iTOL, http:// itol. embl. de).

Statistical analysis.
Datasets were rarefied to a sequence depth of 27,819 sequences per sample. All statistical analyses were performed using R version 3.6.1 99 and the vegan package 100 . All plots were produced using the ggplot2 (3.3.6) 101 , and ComplexHeatmap (3.15) 102 .
Partial least square regression. Partial least square regression (PLS) was used to evaluate the picocyanobacterial relationship with the following independent variables: in situ nutrients (NO 3 (µM), PO 4 (µM), SiO 2 (µM)), temperature (°C), salinity (PSU), stratification index (N 2 ; 10 -3 s −2 ), presence of N 2 -fixers (mg C mL −1 ) and total phytoplankton biomass (mg C mL -1 ). The model was performed separately for PE-SYN (cells mL −1 ) and PC-SYN (cells mL −1 ) separately and by pooling together the data from K-station and LMO. All variables were log 10 (x + 1) transformed for standardization. The stratification index was calculated following the Brunt-Väisälä frequency (3) 103 : where g is the gravitational acceleration (m s −2 ), ρ is the average density of seawater (1.025 g cm −3 ), ρ s is the surface density (g cm −3 ) and ρ b is the density at the bottom (g cm −3 ). The K-station is 3 m deep, and thus all the water column is permanently in the photic zone. However, the calculated N 2 was ~ 0, indicating an unstable water column. Because of this, 1 × 10 -3 s −2 values were assigned to all sampling dates in the K-station, indicating a strong stratification. The variable importance in projection (VIP) was used to determine the relative importance of the independent variables considered 104,105 . Variables were considered significant when VIP > 1.
Procrustes analysis. Procrustes analysis was used to compare correspondence analysis (CA) based on phylogenetic clades and ASVs. The goal of this analysis was to test whether the distribution of the phylogenetic clades was congruent with the distribution of ASVs. The significance of the association of the two datasets was later explored with a squared m12 test (999 permutations, P value < 0.05).

Data availability
The datasets generated and/or analysed during the current study are available in the NCBI repository, BioProject PRJNA810944. (3)